AFRL-SN-HS-TR-2006-0040 


PHENOMENOLOGY-BASED  INVERSE  SCATTERING  FOR  SENSOR 
INFORMATION  FUSION 


Kung-Hau  Ding 


15  September  2006 
Final  Report 


Approved  for  Public  Release;  Distribution  Unlimited. 


AIR  FORCE  RESEARCH  LABORATORY 
Sensors  Directorate 
Electromagnetics  Technology  Division 
80  Scott  Drive 

Hanscom  AFB  MA  01731-2909 


TECHNICAL  REPORT 


Title: Phenomenology-Based Inverse Scattering for Sensor Information Fusion


Unlimited,  Statement  A 


NOTICE 

USING  GOVERNMENT  DRAWINGS,  SPECIFICATIONS,  OR  OTHER  DATA 
INCLUDED  IN  THIS  DOCUMENT  FOR  ANY  PURPOSE  OTHER  THAN  GOVERNMENT 
PROCUREMENT  DOES  NOT  IN  ANY  WAY  OBLIGATE  THE  US  GOVERNMENT.  THE 
FACT  THAT  THE  GOVERNMENT  FORMULATED  OR  SUPPLIED  THE  DRAWINGS, 
SPECIFICATIONS,  OR  OTHER  DATA  DOES  NOT  LICENSE  THE  HOLDER  OR  ANY 
OTHER  PERSON  OR  CORPORATION;  OR  CONVEY  ANY  RIGHTS  OR  PERMISSION 
TO  MANUFACTURE,  USE,  OR  SELL  ANY  PATENTED  INVENTION  THAT  MAY 
RELATE  TO  THEM. 


THIS  TECHNICAL  REPORT  HAS  BEEN  REVIEWED  AND  IS  APPROVED  FOR 
PUBLICATION. 


_ /signature/ _ 

Kung-Hau  Ding,  Monitor 


_ /signature/ _ 

Bertus  Weijers,  Branch  Chief 
Electromagnetic Scattering Technology


_ /signature/ _ 

Michael  N.  Alexander 
Technical  Advisor 

Electromagnetics Technology Division


REPORT  DOCUMENTATION  PAGE 




1. REPORT DATE (DD-MM-YYYY): 15 Sep 2006
2. REPORT TYPE: Final Report
3. DATES COVERED (From - To): 1 Oct 2001 - 31 Dec 2005


4. TITLE AND SUBTITLE: Phenomenology-Based Inverse Scattering for Sensor Information Fusion
5a. CONTRACT NUMBER: In-House
5b. GRANT NUMBER: N/A
5c. PROGRAM ELEMENT NUMBER: 61102F
5d. PROJECT NUMBER: 2304
5e. TASK NUMBER: HE
5f. WORK UNIT NUMBER: 02
6. AUTHOR(S): Kung-Hau Ding
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): AFRL/SNHE, Air Force Research Laboratory, 80 Scott Drive, Hanscom AFB MA 01731-2909
8. PERFORMING ORGANIZATION REPORT
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES): Electromagnetics Technology Division, Sensors Directorate, Air Force Research Laboratory, 80 Scott Drive, Hanscom AFB MA 01731-2909
10. SPONSOR/MONITOR’S ACRONYM(S): AFRL-SN-HS
11. SPONSOR/MONITOR’S REPORT NUMBER(S): AFRL-SN-HS-TR-2006-0040
12. DISTRIBUTION / AVAILABILITY STATEMENT: Distribution A. Approved for Public Release; Distribution Unlimited. ESC/PA, ESC 06-1097

14.  ABSTRACT 

Fusion of sensor and communication data can currently be performed only at a late processing stage, after sensor and
textual information are formulated as logical statements at an appropriately high level of abstraction. In contrast, it
seems, the human mind integrates sensor and language signals seamlessly, before signals are understood, at a
pre-conceptual level. Learning of conceptual contents of the surrounding world depends on language and vice versa. This
paper uses phenomenology of the human mind and sensing processes to achieve high-level sensor fusion. We describe
a phenomenology-based inverse scattering (PBIS) mathematical technique for such integration. It combines PBIS based
on dynamic logic with dual cognitive-language models. The paper briefly discusses relationships between the proposed
mathematical technique, the working of the mind, and applications to understanding-based search engines.


15.  SUBJECT  TERMS 

inverse  scattering,  fusion,  sensors,  communications,  computational  linguistics,  mind,  symbols,  dynamic  logic,  emotions, 
concepts,  evolving  systems 


16. SECURITY CLASSIFICATION OF:
a. REPORT: Unclassified
b. ABSTRACT: Unclassified
c. THIS PAGE: Unclassified
17. LIMITATION OF ABSTRACT: SAR
18. NUMBER OF PAGES: 34
19a. NAME OF RESPONSIBLE PERSON: Kung-Hau Ding
19b. TELEPHONE NUMBER (include area code): N/A

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std.  Z39.18 




CONTENTS 


1. INTRODUCTION: COMPUTERS VS. HUMAN MIND
2. THEORIES OF THE MIND AND COMBINATORIAL COMPLEXITY
3. MIND: CONCEPTS AND EMOTIONS
4. PHENOMENOLOGY-BASED INVERSE SCATTERING (PBIS)
4.1 INTERNAL MODELS, LEARNING, AND SIMILARITY
4.2 FUZZY DYNAMIC LOGIC AND MFT
4.3 IMAGE RECOGNITION, TRACKING AND SENSOR FUSION
4.4 HIGH LEVEL FUSION
5. INTEGRATING LANGUAGE AND SENSOR SIGNALS
5.1 LANGUAGE MODELS
5.2 DYNAMIC LOGIC OF QUALITATIVE SETS
5.3 JOINT LANGUAGE AND SENSOR MODELS
6. DISCUSSION
6.1 WHY MIND, INSTINCTS AND EMOTIONS?
6.2 MFT DYNAMICS
6.3 ELEMENTARY THOUGHT-PROCESS, CONSCIOUS, AND UNCONSCIOUS
6.4 IMAGINATION AND COGNITION
6.5 MIND VS. BRAIN
6.6 INSTINCTS AND EMOTIONS
6.7 AESTHETIC EMOTIONS AND INSTINCT FOR KNOWLEDGE
6.8 COGNITION, SIGNS AND SYMBOLS
7. SUMMARY
REFERENCES


ACKNOWLEDGMENTS 


This  research  has  been  supported  by  Dr.  Arje  Nachman,  Air  Force  Office  of  Scientific 
Research.  The  author  is  thankful  to  Ross  Deming  and  Robert  Linnehan  for  helpful  discussions. 




1.  INTRODUCTION:  COMPUTERS  VS.  HUMAN  MIND 


Current  engineering  approaches  attempt  to  develop  computer  capabilities  for  language 
and  cognition  separately,  usually  in  different  organizations.  Nature  does  it  differently.  A  child 
develops  both  capabilities  jointly.  We  do  not  know  if  it  is  possible  to  code  computers  to  be 
‘cognitive’  or  ‘language  capable’,  one  capability  separately  from  the  other.  Current  approaches 
could  be  invalid  in  principle.  These  considerations  are  prime  motivations  for  this  report.  Let  us 
examine  them  in  some  detail. 

Consider the highly influential JDL fusion model [1]. It is a functional model of a fusion
process with several levels. In the 1999 revision, the model included five levels (from level 0 to
level 4): sub-object, object, situation, impact, and refinement. Further enhancements of the
model considered additional levels, e.g. [2]. Dasarathy proposed a sensor fusion model with three
processing levels: the data level, the feature level, and the decision level [3]. Endsley suggested a
model with three levels of mental representation needed for situation awareness: perception,
comprehension, and projection [4]. This was extended by adding a “resolution” level, generating
behavior to achieve the desired outcome [5]. A situational awareness framework unifying the JDL
and Endsley models was developed in [6]. Practical implementations of high-level fusion
(levels 2, 3, and beyond) require development of detailed models with the appropriate degree of
abstractness for every level. Natural-language type communications are considered necessary (or
at least desirable) at these high levels. However, high-level fusion and communication in
contemporary systems lack the flexibility of human cognition and natural languages. To achieve
fusion and semantic integration at high fusion levels (level 2 or 3 and beyond), developers rely
on models, ontologies, and protocols, which assume shared knowledge and understanding [7]. In
practice, the structures of these models have to be fixed. This is also true for ontologies being
developed for the semantic web. They cannot be as flexible as the “shared knowledge” necessary for
understanding among people. We discuss the specific mathematical reason for this inflexibility in
section 2.

As  the  physical  infrastructure  for  communication  systems  and  the  Internet  matures,  the 
information  services  are  gaining  in  importance.  Distributed  data  fusion  integrated  with  flexible 
communication  would  be  necessary  for  the  future  Sensor  web,  an  integrated  operation  of 
multiple platforms and agents with sensors and communication capabilities. However, computer
systems today use inflexible models and ontologies. They can integrate signals from
sensors with language communication messages only at the high cognitive level of logical
predicates.  First,  information  has  to  be  extracted  from  sensor  signals  and  formulated  as  logical 
statements  at  the  appropriately  high  level  of  abstraction.  Similarly,  language  or  communication 
messages  have  to  be  pre-processed,  the  relevant  data  extracted  and  formulated  as  logical 
statements  at  a  similar  level  of  abstraction.  The  resulting  systems  are  brittle.  As  requirements  and 
hardware  are  changing,  they  become  obsolete. 

Contrary  to  the  brittleness  of  artificial  fusion  systems,  the  human  mind  improves  with 
experience. We argue in this paper that the learning, adaptive, and self-evolving capabilities of the
mind are closely related to the ability to integrate signals subliminally. For example, during
everyday conversations, human eye gaze, the visual processing stream, and the type of
conceptual information extracted from the surrounding world are affected by the contents of speech,
even before it is fully processed and conceptually analyzed. Similarly, speech perception is
affected by concurrent cognitive processing. To some extent, we see what we expect to see;
verbal preconditioning affects cognition, and vice versa. This close, pre-conceptual integration of
language and cognition is important not only in real-time perception and cognition, but also in
ontogenesis, as a child grows up, as well as in the evolution of culture and language. As we
attempt  to  develop  intelligent  systems,  these  lessons  from  biological  systems  and  their  evolution 
should  be  taken  into  account. 

Developing  computer  systems  for  fusion  of  language  and  cognition  might  seem 
premature.  Even  considered  separately,  these  problems  are  very  complex  and  far  from  being 
solved.  Our  systems  for  recognition,  tracking,  and  fusion  using  sensor  data  often  fall  far  short  of 
human  abilities.  Similarly,  our  computer  communication  systems  lack  the  flexibility  of  language. 
Natural  language  understanding  remains  a  distant  goal.  Let  me  repeat  that  the  only  way  two 
computers  can  communicate  at  all  is  due  to  fixed  protocols.  Communications  among  computers 
are  intended  for  human  users.  Computers  do  not  understand  contents  of  communication 
messages, except within narrow domains. Everyone knows the frustration of searching for
information on the Internet; Google and Yahoo do not understand our language. But why should
we hope to achieve progress in fusing two capabilities, neither of which is at hand?

The  answer  was  given  at  the  beginning  of  the  paper.  The  only  system  that  we  know 
capable  of  human  level  cognition  and  communication  is  the  human  mind.  An  individual  human 
mind  develops  both  capabilities  in  ontogenesis,  during  childhood,  jointly.  This  is  opposite  to 
current  engineering  approaches,  which  attempt  to  develop  these  capabilities  separately,  usually 
in different scientific and engineering organizations. It is quite possible that coding a computer
to acquire language and cognitive abilities in a way similar to humans is an ‘easier’ task, and
may possibly be the only way to go. We do not even know if it is possible to code computers to
be  ‘cognitive’  or  ‘language  capable’,  one  capability  separately  from  the  other.  These  current 
approaches  could  be  invalid  in  principle. 

A  similar  argument  is  applicable  to  the  ‘initial’  computer  code,  which  we  would  like  to 
be  similar  to  a  child’s  inborn  capabilities,  enabling  joint  learning  of  language  and  cognition. 
Humans evolved this capability over at least two million years. It is possible that simulating an
accelerated evolution is an ‘easier’ scientific and engineering approach than ‘direct coding’ into
a computer of the current state of a human baby's mind. Moreover, we do not have to
simulate the evolution of culture; computers may learn from humans in a collaborative
human-computer environment. Therefore, along with smart heuristic solutions, we should try to uncover
natural  mechanisms  of  evolving  language  and  culture,  and  to  develop  mathematical  descriptions 
for  these  processes. 

Close  relationships  between  language  and  cognition  encouraged  equating  these  abilities 
in  the  past.  Rule-based  systems  and  mathematics  of  logic  implied  significant  similarities  between 
the two: thoughts, words, and phrases are all logical statements. The situation has changed, in
part because logic-rule systems have not been sufficiently powerful to explain
cognition or language abilities, and in part due to improved scientific understanding
(psychological, cognitive, neural, linguistic) of the mechanisms involved. Contemporary
linguists appreciate that language and cognition could be distinct and different abilities of the
mind [see 8 for further references].

Language mechanisms of our mind include abilities to acquire a large vocabulary and rules
of grammar, and to use the finite set of words and rules to generate a virtually infinite number of
phrases and sentences [9, 10]. Cognition includes abilities to understand the surrounding world in
terms of objects, their relationships (scenes and situations), relationships among relationships,
and so on [11]. Over the last twenty years, researchers in computational linguistics, mathematics
of intelligence and neural networks, cognitive science, neurophysiology, and psychology have
significantly advanced understanding of the mechanisms of the mind involved in learning and
using language, and of the mechanisms of perception and cognition [9, 10, 11, 12, 13, 14]. Much less
progress has been made toward deciphering mechanisms relating linguistic competence to cognition and
understanding the world. Although it seems clear that language and cognition are closely related
abilities, intertwined in evolution, ontogenesis, and everyday use, the currently understood
mechanisms of language are mainly limited to relations of words to other words and phrases,
not to the objects in the surrounding world, nor to cognition and thinking. Possible mathematical
approaches toward integrating language and thinking, words and objects, phrases and situations
are discussed in this paper.

The  paper  starts  with  a  mathematical  description  of  cognition,  which  still  is  an  issue  of 
much controversy. Among researchers in mathematical intelligence it has become appreciated,
especially during the last decades, that cognition is not just a chain of logical inferences [11, 14].
Yet mathematical methods describing cognition as processes involving concepts, instincts,
emotions, memory, and imagination are not well known, although significant progress in this
direction has been achieved [ , 14, 15]. A brief historical overview of this area, including difficulties and
controversies  is  given  in  the  next  two  sections  from  mathematical,  psychological,  and  neural 
standpoints.  It  is  followed  by  a  mathematical  description  of  cognitive  processes,  including 
recognition,  tracking,  and  fusion  as  variations  of  the  same  basic  paradigm.  Then  the  paper 
discusses  the  ways  in  which  the  mathematical  description  of  cognition  can  be  combined  with 
language,  taking  advantage  of  recent  progress  in  computational  linguistics.  It  touches  upon  novel 
ideas  of  computational  semiotics  relating  language  and  cognition  through  signs  and  symbols. 
Approaches  to  building  hierarchy  of  high-level  fusion  (including  levels  2,  3,  and  beyond)  are 
discussed. 

In  conclusion,  I  briefly  touch  on  relationships  between  mathematical,  psychological,  and 
neural  descriptions  of  cognitive  processes  and  language  as  parts  of  the  mind.  Words  like  mind, 
thought, imagination, emotion, and concept are often used colloquially in many ways, but their use in
science, and especially in mathematics of intelligence, has not been uniquely defined and is a
subject of active research and ongoing debate [14, 15]. According to a dictionary [16], the mind
includes conscious and unconscious processes, especially thought, perception, emotion, will,
memory, and imagination, and it originates in the brain. These constituent notions will be briefly
discussed throughout the paper. It turns out that, far from being esoteric abilities removed
from engineering applications, these abilities are inseparable from a mathematical description of
even the simplest cognitive processes. Their understanding is helpful for developing high-level
fusion systems.


2. THEORIES OF THE MIND AND COMBINATORIAL COMPLEXITY

Understanding  signals  coming  from  sensory  organs  involves  associating  subsets  of 
signals  corresponding  to  particular  objects  with  internal  representations  of  these  objects.  This 
leads  to  recognition  of  the  objects  and  activates  internal  brain  signals  leading  to  mental  and 
behavioral  responses,  which  constitute  the  understanding  of  the  meaning  (of  the  objects). 

Developing  mathematical  descriptions  of  the  very  first  recognition  step  of  this  seemingly 
simple  association-recognition-understanding  process  has  not  been  easy,  and  a  number  of 
difficulties  have  been  encountered  during  the  past  fifty  years.  These  difficulties  have  been 
summarized  under  the  notion  of  combinatorial  complexity  (CC)  [17].  The  problem  was  first 
identified in pattern recognition and classification research in the 1960s and was named “the
curse of dimensionality” [18]. The following thirty years of developing adaptive statistical pattern
recognition and neural network algorithms designed for self-learning led to the conclusion that
these approaches often encountered CC of learning requirements: recognition of any object, it
seemed, could be learned if “enough” training examples were used for an algorithm's
self-learning. The required examples had to account for all possible variations of “an object”, in all
possible  geometric  positions  and  in  combinations  with  other  objects,  sources  of  light,  etc., 
leading  to  astronomical  (and  worse)  numbers  of  required  examples. 

By  the  end  of  the  1960s  a  different  paradigm  became  popular:  logic-rule  systems  (or 
expert  systems)  were  proposed  to  solve  the  problem  of  learning  complexity.  According  to 
Minsky, rules were to capture the required knowledge and eliminate the need for learning [19].
Chomsky's first ideas concerning mechanisms of language grammar related to
deep structure were similar [20]; they also used mechanisms of logical rules. Rule systems work well when all
aspects of the problem can be predetermined and there is no unexpected variability. However,
in the presence of unexpected variability, rule systems encountered CC of rules: more and more
detailed sub-rules and sub-sub-rules, one contingent on another, had to be specified.

In  the  1980s  model-based  systems  became  popular.  They  were  proposed  to  combine 
advantages  of  adaptivity  and  learning  with  rules  by  utilizing  adaptive  models.  Existing 
knowledge  was  to  be  encapsulated  in  models  and  unknown  aspects  of  concrete  situations  were  to 
be described by adaptive parameters. The rules and parameters ideas of Chomsky [21] followed
similar lines. Model-based systems encountered computational CC (N and NP complete
algorithms). The reason was that the algorithms considered had to evaluate multiple combinations of
elements of data and rules (models). CC is prohibitive because the number of combinations is
very large: for example, consider 100 elements (not too large a number) whose combinations have
to be evaluated; the number of combinations of 100 elements is 100^100 (that is, 10^200), a number
far larger than the number of elementary particles in the Universe; no computer would ever be able
to compute that many combinations. CC became a ubiquitous feature of intelligent algorithms and,
seemingly, a fundamental mathematical limitation.
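
As a rough check of the arithmetic above, the short Python sketch below computes 100^100 exactly and compares it with two reference figures. Both figures are assumptions introduced here for illustration and do not come from this report: roughly 10^80 as an order-of-magnitude estimate for the number of elementary particles in the observable Universe, and 10^18 evaluations per second as an optimistic machine speed.

```python
# Illustrative arithmetic only. PARTICLES_IN_UNIVERSE and EVALS_PER_SECOND
# are assumed order-of-magnitude figures, not values from the report.
N_ELEMENTS = 100
PARTICLES_IN_UNIVERSE = 10 ** 80   # assumed rough estimate
EVALS_PER_SECOND = 10 ** 18        # assumed, optimistic machine speed

# Python integers are arbitrary precision, so this is exact.
combinations = N_ELEMENTS ** N_ELEMENTS

# For an exact power of ten, the exponent is the digit count minus one.
exponent = len(str(combinations)) - 1
ratio_exponent = len(str(combinations // PARTICLES_IN_UNIVERSE)) - 1
seconds_needed = combinations // EVALS_PER_SECOND
seconds_exponent = len(str(seconds_needed)) - 1

print(f"100^100 = 10^{exponent}")
print(f"ratio to particle estimate = 10^{ratio_exponent}")
print(f"seconds to enumerate = 10^{seconds_exponent}")
```

Under these assumptions, even one evaluation per particle per second would not come close to exhausting the combinations, which is the sense in which CC is prohibitive.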

Combinatorial complexity was related to the type of logic underlying various algorithms
and neural networks [17]. Formal logic is based on the “law of excluded third,” according to
which every statement is either true or false and nothing in between. Therefore, algorithms based
on formal logic have to evaluate every little variation in data or internal representations as a
separate logical statement; the large number of combinations of these variations causes
combinatorial complexity. In fact, combinatorial complexity of algorithms based on logic has
been related to Gödel theory: it is a finite-system manifestation of the incompleteness of
logic [22]. Multivalued logic and fuzzy logic were proposed to overcome limitations related to
the law of excluded third [23]. Yet the mathematics of multivalued logic is no different in
principle from formal logic. Fuzzy logic encountered a difficulty related to the degree of
fuzziness: if too much fuzziness is specified, the solution does not achieve the needed accuracy; if
too little, it becomes similar to formal logic.

Various  approaches  to  fusion  can  be  related  to  mathematical  methods  considered  above. 
For  example,  an  influential  and  general  method  of  Multiple  Hypothesis  Testing  (MHT),  and 
closely  related  Multiple  Hypothesis  Tracking,  are  model-based  methods.  Their  combinatorial 
complexity is widely appreciated. MHT for tracking is often used at the object level (level 1). MHT
using hypotheses and models of situations is often used for fusion at higher levels (2, 3, and
beyond). Combinatorial complexity prevents these mathematical methods from achieving
human-like flexibility and adaptivity. Yet general methods for high-level fusion that overcome CC
have not been developed. In the presence of variability, the most difficult aspect of fusion, it seems, is fusion of
signals with knowledge, or fusion of lower-level knowledge with higher-level knowledge. The
reason is the need to ‘fit’ higher-level models to objects and situations identified at lower levels.
This requires testing of multiple combinations and leads to CC. In section 4 we discuss a
biologically inspired mathematical approach to fusion, which overcomes CC. The biological
inspirations for this approach are briefly summarized in section 3.


3. MIND: CONCEPTS AND EMOTIONS


The  seemingly  fundamental  nature  of  mathematical  difficulties  discussed  above  has  led 
many to believe that classical physics cannot explain the working of the mind. Yet, I would like
to emphasize another aspect of the problem: often mathematical theories of the mind were
proposed before the necessary physical intuition of how the mind works was developed. Newton,
as often mentioned, did not consider himself to be evaluating various hypotheses about the working
of the material world; he felt that he had what we call today a physical intuition about the world
[24]. An intuition about the mind points to mechanisms of concepts, emotions, instincts,
imagination, behavior, consciousness, and the unconscious [25]. An essential role of emotions in the
working of the mind was analyzed from the psychological and neural perspective by Grossberg
[26], from the neurophysiological perspective by Damasio [27], and from the learning and control
perspective by the author [ , , ]. One reason for the engineering community being slow in
adopting these results is the cultural bias against emotions as a part of cognitive processes. Plato
and Aristotle thought that emotions are “bad” for intelligence; this is a part of our cultural
heritage (“one has to be cool to be smart”), and the founders of Artificial Intelligence repeated
this truism about emotions [ ]. Yet, as discussed in the next section, combining conceptual
understanding  with  emotional  evaluations  is  crucial  for  overcoming  the  combinatorial 
complexity  as  well  as  related  difficulties  of  logic. 

Let me summarize briefly, and in a much simplified way, several aspects of the working of
the mind that seem essential to the development of mathematical descriptions of the mind's
mechanisms: instincts, concepts, emotions, and behavior [ , ].

The mechanism of the mind most accessible to our consciousness is concepts: the mind
operates with concepts. Concepts are like internal models of objects and situations. This
analogy is quite literal; e.g., during visual perception of an object, an internal concept-model
projects an image onto the visual cortex, which is matched there to an image projected from the
retina (this simplified description will be refined later). The concept mechanism evolved for the
purpose of survival, and therefore it serves for better satisfaction of the basic instincts, which
emerged as survival mechanisms even before the mind. Instincts operate like internal
sensors: for example, when the sugar level in the blood falls below a certain level, an instinct “tells us”
to eat.


We do not read ‘instinctual sensor readings.’ Satisfaction or dissatisfaction of instincts is
measured by the emotions that we feel. Emotions are neural signals connecting instinctual and
conceptual brain regions. In colloquial usage, emotions are often understood as facial
expressions, higher voice pitch, or exaggerated gesticulation, but these are the outward signs of
emotions, serving for communication. A more fundamental role of emotions within the mind
system is that emotional signals evaluate concepts for the purpose of instinct satisfaction. This
evaluation is not according to rules or concepts (as in rule systems of artificial intelligence),
but according to a different instinctual-emotional mechanism [14]. The specific instinct and emotions
related to learning [25] and their mathematical mechanisms are described in the next section. This
instinctual-emotional mechanism is crucial for breaking out of the “vicious circle” of
combinatorial complexity.

The results of conceptual-emotional understanding of the world are actions (or behavior)
in the outside world or within the mind. In this paper we touch on only one type of behavior, the
behavior of improving understanding and knowledge of language and the world. In the next
section we describe a mathematical theory of ‘simple’ conceptual-emotional recognition and
understanding. This includes tracking and sensor fusion. As we will discuss, in addition to
concepts and emotions, it necessarily involves mechanisms of intuition, imagination,
the conscious, and the unconscious. This process is intimately connected to the ability of the mind to
form symbols and interpret signs.

The  mind  involves  a  hierarchy  of  multiple  levels  of  concept-models,  from  simple 
perceptual  elements  (like  edges,  or  moving  dots),  to  concept-models  of  objects,  to  complex 
scenes, and up the hierarchy... toward the concept-models of the meaning of life and purpose of
our existence. Hence the tremendous complexity of the mind; yet relatively few basic principles
of the mind's organization go a long way toward explaining this system. Development of sensor fusion
systems does not require the most general and abstract models. Situational awareness, for example,
requires  models  of  situations. 




4. PHENOMENOLOGY-BASED INVERSE SCATTERING (PBIS)

Phenomenology-based inverse scattering [ ] is implemented in this paper using
modeling field theory (MFT), summarized below. It is a biologically inspired multi-level
intelligent system. The MFT architecture is based on the phenomenology of the human mind. In addition,
specific models within MFT use the phenomenology of the appropriate sensing processes. At each
level MFT associates lower-level signals with higher-level concept-models (representations,
ontologies, scattering models), resulting in understanding of signals, while overcoming the
difficulties of CC [ , ]. CC is overcome by using a new type of logic, fuzzy dynamic logic.
Modeling field theory is a hetero-hierarchical system. We first describe the basic structure of
interaction between two adjacent hierarchical levels of signals; sometimes it will be more
convenient to talk about these two signal levels as an input to and output from a (single)
processing level.

At  each  level,  input  comes  either  from  sensors  (at  the  lowest  level),  or  from  a  lower  level; 
output  signals  are  concepts  recognized  (or  formed)  in  input  signals.  Input  signals  X  are 
associated  with  (or  recognized,  or  grouped  into)  concepts  according  to  the  representations- 
models  and  similarity  measures  at  this  level.  In  the  process  of  association-recognition,  models 
are  adapted  for  better  representation  of  the  input  signals;  and  similarity  measures  are  adapted  so 
that their fuzziness is matched to the model uncertainty. The initial uncertainty of models is high
and so is the fuzziness of the similarity measure; in the process of learning, models become more
accurate, the similarity measure becomes more crisp, and its value increases. This
mechanism is called fuzzy dynamic logic, or dynamic logic for short.


4.1 INTERNAL MODELS, LEARNING, AND SIMILARITY

During  the  learning  process,  new  associations  of  input  signals  with  concept-models  are 
formed  resulting  in  evolution  of  new  concepts.  Input  signals  are  denoted  {X(n)},  n  =  1,...  N; 
concept-models  {Mh(n)},  h  =  1,...  H,  predict  values  of  signals  X(n)  expected  from  object  (or 
situation)  h;  each  model  depends  on  its  parameters  {Ph},  Mh(Ph,n).  In  a  highly  simplified 
description of a visual cortex, n enumerates the visual cortex neurons, X(n) are the “bottom-up”
activation levels of these neurons coming from the retina through the visual nerve, and Mh(n) are the
“top-down” activation levels (or priming) of the visual cortex neurons from previously learned
object-models [36]. The learning process attempts to “match” these top-down and bottom-up
activations by selecting “best” models and their parameters. Mathematically, learning increases a
similarity  measure  between  the  sets  of  models  and  signals,  L({X(n)},{Mh(n)}).  A  biological 
interpretation  is  that  similarity  maximization  is  the  instinct  for  knowledge,  for  improving 
correspondence  between  the  concept-models  and  the  world.  The  similarity  measure  is  a  function 
of  model  parameters  and  associations  between  the  input  signals  and  concepts-models.  It  is 
constructed  in  such  a  way  that  any  of  a  large  number  of  objects  can  be  recognized. 
Correspondingly,  a  similarity  measure  is  designed  so  that  it  treats  each  concept-model  as  an 
alternative  for  each  subset  of  signals 

$$L(\{X\},\{M\}) = \prod_{n \in N} \sum_{h \in H} r(h)\, l(X(n) \mid M_h(n)). \tag{1}$$

Here, l(X(n)|Mh(n)) (or simply l(n|h)) is a conditional partial similarity between one signal X(n)
and one model Mh(n), and all possible combinations of signals and models are accounted for in
this expression. Parameters r(h) are proportional to the number of signals {n} associated with the
model h (in statistics they are called priors). Although (1) contains a product over individual
signal samples, n, signal samples are not assumed to be statistically independent; their
inter-dependence is defined by the models M. Note that, when the product of sums in (1) is
expanded, it contains a large number of combinations of models and signals, all possible
signal-model associations, a total of H^N items. This is the source of the combinatorial
complexity of many algorithms discussed in section 2.
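
The H^N count can be made concrete with a small numerical check. The following Python sketch is not part of the original report; the function names, array shapes, and random test values are illustrative assumptions. It evaluates similarity (1) once as a product of sums (about H*N operations) and once as an explicit sum over every signal-to-model assignment (H^N terms), and the two values agree.

```python
import itertools
import numpy as np

def similarity_product_of_sums(r, l):
    """Similarity (1): product over signals n of sum over models h of r(h) l(n|h).

    r: shape (H,), weights r(h).  l: shape (N, H), partial similarities l(n|h).
    """
    return float(np.prod(l @ r))

def similarity_expanded(r, l):
    """Same quantity written as a sum over all H**N signal-to-model assignments."""
    N, H = l.shape
    total = 0.0
    n_terms = 0
    for assignment in itertools.product(range(H), repeat=N):
        term = 1.0
        for n, h in enumerate(assignment):
            term *= r[h] * l[n, h]
        total += term
        n_terms += 1
    return total, n_terms

# Toy problem (assumed values): 3 models, 6 signals
rng = np.random.default_rng(0)
H, N = 3, 6
r = np.full(H, 1.0 / H)
l = rng.uniform(0.1, 1.0, size=(N, H))

direct = similarity_product_of_sums(r, l)
expanded, n_terms = similarity_expanded(r, l)
print(n_terms, direct, expanded)
```

Even in this toy case the expanded form visits 729 assignments; an algorithm that evaluates associations one by one inherits that exponential growth in N.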

In  the  process  of  learning,  concept-models  are  constantly  modified.  From  time  to  time  a 
system  forms  a  new  concept,  while  retaining  an  old  one  as  well;  alternatively,  old  concepts  are 
sometimes  merged.  Formation  of  new  concepts  and  merging  of  old  ones  require  a  modification 
of the similarity measure (1); the reason is that more models always result in a better fit between
the models and data. This is a well-known problem; it can be addressed by reducing (1) using a
“penalty function”, p(N,M), that grows with the number of models M, and this growth is steeper
for a smaller amount of data N. For example, an asymptotically unbiased maximum likelihood
estimation leads to multiplicative p(N,M) = exp(-Npar/2), where Npar is the total number of adaptive
parameters in all models (this penalty function is known as the Akaike Information Criterion, see [' ']
for further discussion and references).
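
As a hedged illustration of how the multiplicative penalty acts, the sketch below works with logarithms, where multiplying (1) by exp(-Npar/2) becomes subtracting Npar/2. The two candidate fits and their log-similarity values are made-up numbers chosen only to show the comparison; they are not results from the report.

```python
def penalized_log_similarity(log_similarity, n_par):
    """Log of similarity (1) after multiplying by the penalty exp(-n_par / 2)."""
    return log_similarity - 0.5 * n_par

# Hypothetical candidates: (log-similarity, total number of adaptive parameters)
candidates = {"2 models": (-1250.0, 6), "3 models": (-1249.0, 9)}
scores = {name: penalized_log_similarity(ll, k) for name, (ll, k) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here the third model improves the raw fit by one log unit, which does not pay for its three extra parameters, so the two-model description is retained.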


4.2  FUZZY  DYNAMIC  LOGIC  AND  MFT 

The  learning  process  consists  in  estimating  model  parameters  Ph  and  associating  subsets 
of signals with concepts by maximizing the similarity (1). Fuzzy dynamic logic [ ’ ’ ’ ] is an
iterative  process,  which  solves  this  problem  without  combinatorial  complexity  as  follows.  The 
iterations  start  with  any  arbitrary  values  of  the  unknown  parameters  Ph.  The  next  step  is  to 
compute  fuzzy  association  variables  f(h|n) 

$$f(h|n) = \frac{r(h)\, l(n|h)}{\sum_{h' \in H} r(h')\, l(n|h')}. \tag{2}$$

These variables give a measure of correspondence between signal X(n) and model Mh relative to
all other models, h'. The mechanism of concept formation and learning, the dynamics of the
modeling fields, is defined as follows:


$$P_h = P_h + a \sum_{n} f(h|n)\, \left[ \partial \ln l(n|h) / \partial M_h \right] \partial M_h / \partial P_h, \tag{3}$$

$$r(h) = N_h / N; \qquad N_h = \sum_{n} f(h|n). \tag{4}$$

Here, parameter a determines the iteration step and speed of convergence of the MF system; Nh
can be interpreted as the number of signals X(n) associated with (or coming from) a concept-object
h. Steps (2), (3) and (4) are repeated iteratively until convergence, which is measured by
parameter changes falling below a predefined threshold. As already mentioned, in the MF
internal dynamics, similarity measures are adapted so that their fuzziness is matched to the
model uncertainty. Mathematically, this can be accomplished in several ways, depending on the
specific parameterization of the conditional partial similarity measures, l(n|h); for example, they
can be defined as Gaussian functions,

$$l(n|h) = (2\pi)^{-d/2} (\det C_h)^{-1/2} \exp\{ -0.5\, (X(n) - M_h(n))^T C_h^{-1} (X(n) - M_h(n)) \}. \tag{5}$$

Here,  d  is  the  dimensionality  of  the  vectors  X  and  M,  and  Ch  is  a  covariance.  The  dynamics  of 
fuzziness  of  the  MF  similarity  measures  is  defined  as 

$$C_h = \sum_{n} f(h|n)\, (X(n) - M_h(n)) (X(n) - M_h(n))^T / N_h. \tag{6}$$
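
The iteration defined by (2), (3), (4) and (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the report: the data are one-dimensional, each model is a constant M_h(n) = mu_h (so the parameter P_h is just mu_h), and the step in (3) is taken per model as a*C_h/N_h, which for a = 1 turns the gradient step into a weighted-mean update. The function name, the test data, and the initial values are all hypothetical.

```python
import numpy as np

def dynamic_logic_1d(X, mu0, var0, a=1.0, n_iter=200, tol=1e-8):
    """Iterate (2), (3), (4), (6) for constant models M_h(n) = mu_h in one dimension."""
    X = np.asarray(X, dtype=float)
    mu = np.array(mu0, dtype=float)
    H = mu.size
    var = np.full(H, float(var0))      # large initial variance = high fuzziness
    r = np.full(H, 1.0 / H)
    history = []                        # log of similarity (1) at each iteration
    for _ in range(n_iter):
        diff = X[:, None] - mu[None, :]                                 # shape (N, H)
        # Eq. (5) with d = 1: Gaussian partial similarities l(n|h)
        l = np.exp(-0.5 * diff**2 / var) / np.sqrt(2.0 * np.pi * var)
        weighted = r * l
        history.append(float(np.sum(np.log(weighted.sum(axis=1)))))
        # Eq. (2): fuzzy association variables f(h|n)
        f = weighted / weighted.sum(axis=1, keepdims=True)
        Nh = f.sum(axis=0)
        # Eq. (3): d ln l / dM = diff / var, dM / dP = 1, assumed step a * var / Nh
        step = a * (f * diff).sum(axis=0) / Nh
        mu = mu + step
        # Eq. (4): update r(h)
        r = Nh / X.size
        # Eq. (6): fuzziness (variance) follows the updated models
        var = (f * (X[:, None] - mu[None, :])**2).sum(axis=0) / Nh
        if np.max(np.abs(step)) < tol:
            break
    return mu, var, r, f, history

# Hypothetical test data: two well-separated groups of signals
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
mu, var, r, f, history = dynamic_logic_1d(X, mu0=[-2.0, 3.0], var0=25.0)
print(np.sort(mu), var, r)
```

Each pass costs on the order of H*N operations, and no enumeration of signal-to-model assignments is performed. The variances start large and shrink as the models lock onto the data, which is the fuzzy-to-crisp behavior described in the text.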


Initially, models do not match data; any data sample n fits any model h equally poorly, and
association variables, f(h|n), take homogeneous values across the data, associating all
concept-models h with all input signals n. Correspondingly, covariances are large, encompassing
all the data. As matching improves, covariances become smaller, and the association variables,
f(h|n), tend to values near 1 for some subsets of signals and models and near zero for others;
thus certain concepts get associated with certain subsets of signals (objects are recognized and
concepts formed). The following theorem was proven [n].

Theorem. Equations (2) through (6) define a convergent dynamic system MF with
stationary states given by $\max_{\{P_h\}} L$.

In  plain  language  this  means  that  the  above  equations  indeed  result  in  concept-models  in 
the  “mind”  of  the  MFT  system,  which  are  most  similar  [in  terms  of  similarity  (1)]  to  the  sensor 
data. Despite a combinatorially large number of items in (1), the computational complexity of
the MF method is relatively low: it is linear in N and could be implemented by a physical system
(like a computer or a brain). These equations describe a loop system, which is illustrated in the
block-diagram in Fig. 1. The MFT / dynamic logic loop sustains its operations on its own; the
loop is not closed in that there are input signals into the loop and output concepts from the loop.
The theorem is proved by demonstrating that similarity (1) increases at each iteration step, eqs.
(3) and (4). A biological interpretation is that these equations satisfy the instinct for
knowledge, and the resulting increases in similarity therefore correspond to positive emotions.
MFT, adapting to data according to dynamic logic, ‘enjoys’ the learning process.

Comment. Definition (5) of conditional partial similarities using Gaussian functions can
be considered a basis for the following probabilistic interpretation: a model Mh(Ph,n) is a
conditional statistical expectation of signals from object h described by parameters Ph, and the
similarity measure (1) is a total likelihood. Let me emphasize that such an interpretation could be
valid if, for some values of the parameters, the models are accurate (that is, the models actually are
conditional statistical expectations). If models are approximate in a non-statistical sense, other
similarity measures could be preferable mathematically, like mutual information in the
models about the data ["]. I’d also like to emphasize that, unlike the usual “Gaussian assumption,”
the MFT structure is quite general: it does not assume that the signal distribution is Gaussian,
only that the conditional deviations between models and signals are; this likelihood can represent
any statistical distribution ["].

Fig. 1. For a single level of MFT, input signals are unstructured data {X(n)} and output signals
are  recognized  or  formed  concepts  {h}  with  high  values  of  similarity  measures.  The  MFT 
equations  (2)  through  (6)  describe  a  continuous  loop  operation  involving  input  signals,  similarity 
measures,  models,  and  actions  of  the  model  adaptation  (the  inner  loop  in  this  figure). 
Biologically,  a  similarity  measure  corresponds  to  the  knowledge  instinct  and  its  changes  to 
emotions. 

Dynamic  fuzziness  control.  An  essential  part  of  dynamic  logic  is  a  match  between 
fuzziness  of  similarity  measures  and  uncertainty  in  model  parameters.  Initial  parameter  values 
could be wrong; correspondingly, initial values of covariances should be large. The subsequent
dynamics of fuzziness is automatically given by (6). Accurate estimation of covariance matrices
by (6) requires a sufficient amount of data. If the amount of data is not sufficient for accurate
estimation of covariances, the following regularization procedure is recommended. Define an
initial large value of covariances, C1h, corresponding to initial errors in parameter values, and
define the final smallest possible value for covariances, C0h, according to sensor errors (or other
known sources of errors). Modify (6) as follows,

$$C_h = C1_h \exp(-\beta \cdot it) + C0_h + C_h. \tag{7}$$

Here, it is the iteration number, β defines the speed of fuzziness reduction, which should be
matched to the speed of convergence, and C_h on the right-hand side is expression (6).
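
To show the shape of this schedule, here is a small Python sketch of (7) for a scalar covariance. The function name and all numerical values (initial value, floor, β, and the fixed data-driven estimate standing in for expression (6)) are assumptions chosen for illustration only.

```python
import math

def regularized_covariance(c_est, c1, c0, beta, it):
    """Eq. (7), scalar case: decaying initial term + sensor-error floor + estimate (6)."""
    return c1 * math.exp(-beta * it) + c0 + c_est

# Hypothetical values: large initial fuzziness 25.0, floor 0.01, estimate (6) held at 0.2
schedule = [
    regularized_covariance(c_est=0.2, c1=25.0, c0=0.01, beta=0.5, it=it)
    for it in range(20)
]
print(schedule[0], schedule[-1])
```

The covariance starts near the large initial value and decays toward the sum of the floor and the estimate from (6), so the similarity measure never becomes crisper than the sensor errors allow.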

Summary of the DL convergence. During an adaptation process, initially fuzzy and
uncertain models (internal structures of the MF system) are associated with structures in the
input signals, and the fuzzy models become more definite and crisp. The type, shape and number of
models  are  selected  so  that  the  internal  representation  within  the  system  is  similar  to  input 
signals:  the  MF  concept-models  represent  structure-objects  in  the  input  signals.  An  example  of 
this  process  is  discussed  next. 

Syntax, semantics and inference rules. The syntax of dynamic logic (the structure of its
statements  in  terms  of  its  elements)  is  given  by  the  structure  of  similarity  measure  (1).  The 
hierarchical  aspect  of  syntax  is  given  by  the  MFT  hierarchical  structure,  and  this  is  discussed 
later.  MFT  architecture  also  defines  interactions  and  relations  among  models,  which  is  also  an 
aspect  of  the  syntax  of  dynamic  logic.  Another  aspect  of  dynamic  logic  syntax  is  the  internal 
structure  of  models;  any  types  of  models  can  be  used  (analytic,  numeric,  logical  rules, 
probabilistic,  fuzzy,  etc.),  making  it  a  general  type  structure.  The  semantics  (the  meanings)  of 
dynamic  logic  have  the  following  aspects:  every  signal  obtains  its  meaning  as  a  part  of  the  model 
it  belongs  to  (i.e.,  the  model  it  has  a  high  similarity  with).  For  example,  pixels  associated  with 
the  model  “chair”  acquire  the  meanings  of  “chair.”  Similarly,  every  model  obtains  its  meaning  as 
a part of a higher-level model in the hierarchy. This is the cognitive or knowledge-related
meaning; it pertains to the structure of knowledge and is independent of any utilitarian use; for
this reason it is called aesthetic, as distinct from utilitarian (this designation is discussed
in more detail in section 6). Similarity values give the strength of associations; they evaluate the
certainty of knowledge. Changes in the similarity values are aesthetic emotions. They contain the
aesthetic  emotional  aspect  of  semantics.  The  semantic  field  (a  set  of  signals  or  models  with 
similar  meanings)  is  defined  by  signals  (or  models)  associated  with  the  same  higher-level  model. 
The  utilitarian  aspect  of  semantics  is  contained  in  relationships  between  cognitive  models  and 
behavioral  (action)  models.  For  example,  a  cognitive  model  “chair”  is  connected  to  behavioral 
model  “sit.”  Mathematical  structures  of  utilitarian  semantics  are  not  discussed  in  this  paper.  The 
inference rules of dynamic logic specify how new statements are derived from previously known
statements; these are given by eqs. (2) through (7). An important feature of these rules is the
reduction  of  fuzziness  and  uncertainty  during  learning,  as  illustrated  in  an  example  in  the  next 
section. 


4.3  IMAGE  RECOGNITION,  TRACKING  AND  SENSOR  FUSION 

Consider  recognition  of  objects  in  images,  a  level  0  to  level  1  type  problem,  in  a  complex 
case,  when  object  signals  are  below  clutter  or  noise.  An  information-type  similarity  measure  is 
appropriate  for  this  case  [n'4°], 

$$L(\{X\},\{M\}) = \prod_{n \in N} \Big[ \sum_{h \in H} r(h)\, l(X(n) \mid M_h(P_h,n)) \Big]^{X(n)}. \tag{8}$$


Here, n is a two-dimensional index enumerating image pixels, and X(n) is the absolute value of
the image intensity in pixel n. In certain cases this similarity measure can be interpreted as
mutual information in models { Mh(Ph,n) } about image { X(n) }. DL equation (2) does not
change; in equations (3, 4), f(h|n) is changed into X(n)f(h|n), and N stands for the entire power in the
image, $N = \sum_n X(n)$. As described in [ ], we define models accounting for image intensity as follows.
First, an object pixel-model is defined, Yh(Ph,k); this is a set of image pixels for k = 1... K. Then
image intensity models are defined as

$$M_h(P_h,n) = \sum_{k} I(P_h,k)\, G(n \mid Y_h(P_h,k)). \tag{9}$$

Here, I(Ph,k) specify the model pixel intensities and Gaussian functions are used for smooth
intensity  distributions.  The  model  is  relatively  insensitive  to  the  number  of  pixels  k  in  (9);  it 
should  be  selected  so  that  the  object  shape  is  adequately  represented;  also  since  the 
computational  complexity  is  proportional  to  the  number  of  pixels  in  the  models,  it  should  not  be 
excessive.  Convergence  properties  depend  on  the  number  of  the  model  parameters,  Ph,  rather 
than  on  the  number  of  pixels  k  used  in  this  model. 
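
A minimal Python sketch of the intensity model (9) follows. It is an illustration only: the isotropic Gaussian of width sigma, the parabolic pixel-model and its parameterization, the image size, and all names are assumptions introduced here, not the models used in the report.

```python
import numpy as np

def intensity_model(shape, model_pixels, intensities, sigma):
    """Eq. (9): M_h(n) = sum_k I_k * G(n | Y_k), with an isotropic 2-D Gaussian G."""
    rows, cols = np.indices(shape)
    grid = np.stack([rows, cols], axis=-1).astype(float)       # (R, C, 2) pixel index n
    model_pixels = np.asarray(model_pixels, dtype=float)        # (K, 2) pixel-model Y_h
    intensities = np.asarray(intensities, dtype=float)          # (K,) intensities I(P_h, k)
    d2 = ((grid[:, :, None, :] - model_pixels[None, None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-0.5 * d2 / sigma**2) / (2.0 * np.pi * sigma**2)  # (R, C, K)
    return (G * intensities).sum(axis=-1)

def parabola_pixels(x0, y0, curvature, half_width, K):
    """Hypothetical parabolic pixel-model Y_h(P_h, k), returned as (row, col) pairs."""
    t = np.linspace(-half_width, half_width, K)
    return np.stack([y0 + curvature * t**2, x0 + t], axis=-1)

# Assumed parameters P_h: vertex at column 32, row 20, curvature 0.05, K = 31 model pixels
pixels = parabola_pixels(x0=32.0, y0=20.0, curvature=0.05, half_width=15.0, K=31)
image = intensity_model((64, 64), pixels, np.ones(31), sigma=1.5)
print(image.shape, image.sum())
```

The cost of building the model grows with K times the number of image pixels, which is why the text advises keeping K no larger than needed to represent the object shape.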

Figure  2  shows  an  example  of  using  this  technique  for  finding  objects  below  noise  in  an 
image.  This  is  a  difficult  problem  because  detection  and  model  estimation  have  to  be 
accomplished concurrently. A brute force fitting of multiple hypotheses to the data would lead to
combinatorial complexity (10^30 to 10^40 computer operations, vs. 10^8 operations in our example).
Because of this combinatorial complexity, detection usually requires a high signal-to-noise ratio,
so that detection can be performed separately from estimation. In this example the improvement
in terms of signal-to-noise ratio is about two orders of magnitude.


Fig. 2.  Finding  ‘smile’  and  ‘frown’  patterns  in  noise,  an  example  of  dynamic  logic  operation:  (a) 
true  ‘smile’  and  ‘frown’  patterns  shown  without  noise;  (b)  actual  image  available  for  recognition 
(signal  is  below  noise,  signal-to-noise  ratio  is  between  -2dB  and  -0.7dB);  (c)  an  initial  fuzzy 
model,  the  fuzziness  corresponds  to  uncertainty  of  knowledge;  (d)  through  (h)  show  improved 
models at various iteration stages (total of 22 iterations). At stage (d) the algorithm tried to fit the
data with more than one model and decided that it needs three models to ‘understand’ the
content of the data. There are three types of models: one uniform model describing noise (it is
not shown) and a variable number of blob-models and parabolic models, whose number, location
and curvature are estimated from the data. Until about stage (g) the algorithm ‘thought’ in terms
of simple blob models; at (g) and beyond, the algorithm decided that it needs more complex
parabolic models to describe the data. Iterations stopped at (h), when similarity (1) stopped
increasing. This example is discussed in more detail in [41].




In this example there are three parabolic-shape ‘smile’ and ‘frown’ objects buried in the
noise. Several types of models are used to describe the data: one uniform model describing noise
(it is not shown) and a variable number of blob-models and parabolic models, whose number,
location and curvature are estimated from the data. Panel (c) of Fig. 2 shows an initial fuzzy
model, whose fuzziness corresponds to uncertainty of knowledge; panels (d) through (h) show
improved models at various iteration stages (total of 22 iterations). At stage (d) the algorithm
tried to fit the data with more than one model and decided that it needs three models to
‘understand’ the content of the data. Until about stage (g) the algorithm ‘thought’ in terms of
simple blob models; at (g) and beyond, the algorithm decided that it needs more complex
parabolic models to describe the data. Iterations stopped at (h), when similarity (1) stopped
increasing.

MFT  described  above  can  be  used  for  target  tracking  with  appropriately  defined  models. 
Consider  a  case  of  predetected  data,  that  is,  sensor  data  are  first  compared  against  a  small 
threshold.  In  this  case  data  usually  are  a  set  of  coordinates  in  time,  so  that  every  data  point  n  is 
also  characterized  by  its  acquisition  time,  tn,  X(n,  tn).  This  problem  is  difficult  if  in  addition  to 
object  signals,  data  also  contain  a  large  number  of  clutter  data  points;  also  several  objects  might 
be  present.  In  these  complex  cases  target  detection  and  track  estimation  have  to  be  performed 
concurrently (this joint problem is sometimes called ‘track-before-detect’). As in the previous
case of image recognition, signal amplitude does not convey sufficient information about target
presence. The source of information is the consistency of target motion, and MFT exploits it by
using the track model. Appropriate information about expected target trajectories is therefore
necessary. For linear target motion (constant velocity), the linear model is

$$M_h(P_h, n, t_n) = x_h + v_h (t_n - t_0). \tag{10}$$

Here x_h is the position of target h at time t_0, and v_h is the target velocity. More complicated
models can be used as needed [42,43]; for example, Keplerian models should be used for satellites. If the
model  is  accurate,  MFT  performs  the  maximum  likelihood  estimation  of  the  model  parameters 
and  therefore  often  attains  or  comes  close  to  the  best  possible  performance  as  given  by  the 
Cramer-Rao  Bound  (CRB)  for  the  joint  tracking  and  detection  [44]. 
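
The linear track model (10) and the role of the association weights can be sketched as follows. This Python example is an assumption-laden illustration, not the report's algorithm: instead of the gradient step (3), it uses the closed-form weighted least-squares solution, which maximizes the weighted Gaussian log-similarity for a fixed covariance and fixed weights f(h|n). The weights are set by hand to 1 for track points and 0 for clutter, and the data are noise-free, so the fit recovers the track parameters exactly.

```python
import numpy as np

def track_model(x0, v, t, t0=0.0):
    """Eq. (10): predicted positions x_h + v_h (t_n - t_0) for every acquisition time."""
    dt = np.asarray(t, dtype=float) - t0
    return np.asarray(x0, dtype=float)[None, :] + dt[:, None] * np.asarray(v, dtype=float)[None, :]

def fit_track(X, t, weights, t0=0.0):
    """Weighted least-squares estimate of (x_h, v_h); weights play the role of f(h|n)."""
    dt = np.asarray(t, dtype=float) - t0
    A = np.column_stack([np.ones_like(dt), dt])                  # (N, 2) design matrix
    Aw = A * np.asarray(weights, dtype=float)[:, None]
    theta = np.linalg.solve(Aw.T @ A, Aw.T @ np.asarray(X, dtype=float))  # (2, d)
    return theta[0], theta[1]

# Hypothetical scene: 21 noise-free track points plus 50 clutter points
t_track = np.linspace(0.0, 10.0, 21)
X_track = track_model([1.0, 2.0], [0.5, -0.3], t_track)
rng = np.random.default_rng(0)
t_clutter = rng.uniform(0.0, 10.0, 50)
X_clutter = rng.uniform(-10.0, 10.0, size=(50, 2))

X = np.vstack([X_track, X_clutter])
t = np.concatenate([t_track, t_clutter])
weights = np.concatenate([np.ones(21), np.zeros(50)])           # idealized associations
x0_hat, v_hat = fit_track(X, t, weights)
print(x0_hat, v_hat)
```

In the full method the weights are not given; they are the fuzzy association variables from (2), recomputed at each iteration as the track estimate improves.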

Spatial dimensions of sensor data might be lower than those of the model. For example,
optical sensors measure 2-dimensional angular coordinates (of target and clutter signals),
whereas the requirement may include estimation of 3-dimensional coordinates in (x, y, z).
Modification of the above model for this case is straightforward: angular positions should be
expressed as functions of 3-dimensional coordinates. Another aspect of the problem is whether
2-dimensional data contain sufficient information for estimating a 3-dimensional model. This is
usually possible for tracking objects on Keplerian trajectories over an extended period of time.
Sometimes it is possible for other types of motion. The general question of whether the track
data contain sufficient information for model estimation and target detection can be answered
using the joint tracking and detection CRB [44]. An interesting and practically important situation
of  tracking  in  three  dimensions  with  optical  sensors  includes  sensor  fusion,  when  two  or  more 
sensors  are  used  for  tracking. 

MFT  is  equally  applicable  to  identifying  objects,  tracks,  and  situations  in  signals  from  a 
single sensor or multiple sensors (sensor fusion). Again, one needs to use models appropriate for
each sensor. Let us introduce an index s = 1... S, enumerating sensors. Instead of single sensor
signals { X(n) }, we now deal with multiple sensor signals, { X(s,n) }. Here, n enumerates data
from each sensor separately, and it is appropriate to use notation ns. To simplify notations, we
will omit this sub-index. It is important to remember that there is no a priori association among
data from different sensors; although n always stands for ns, these are independent indexes and
no association among sensors is assumed. MFT for sensor fusion requires development of
models that predict signals from multiple sensors; instead of Mh(Ph,n) we have to use
Mh(Ph,s,n). For example, when tracking objects on linear tracks, the same model (10) (in 3
dimensions) is used for each object, but an additional step is required: the model for each sensor,
Mh(Ph,s,n), should be computed from (10) by using the appropriate coordinate transformation for
each sensor. Except for this trivial additional step, the previous formulation is applicable.
Defining similarity by analogy with likelihood, as in (1), according to the basic law of
probability, we obtain

$$L(\{X\},\{M\}) = \prod_{s \in S} \prod_{n \in N} \sum_{h \in H} r(h)\, l(X(s,n) \mid M_h(s,n)). \tag{11}$$


Correspondingly,  the  only  change  in  equations  (2,  3,  4)  is  a  substitution  of  index  n  by  s,n. 
Association  variables,  f(h|s,n),  which  previously  associated  models  with  signals,  now  associate 
models with signals from multiple sensors. Applications of this theory to the problem of joint
navigation and fusion of sensors from multiple UAVs are considered in [45]. I’d
emphasize that this formulation, if required, solves the joint problem of concurrent fusion,
tracking, navigation, and detection, along with association among data and sensors.
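
The structure of (11) can be checked numerically. In the hedged Python sketch below, the partial similarities for each sensor are random placeholder values (in practice they would come from sensor-specific models Mh(Ph,s,n)), and the function names are assumptions. The point shown is that, in logarithms, the product over sensors becomes a sum of per-sensor terms, and that sensors may contribute different numbers of samples with no pairing between them.

```python
import numpy as np

def log_similarity(r, l):
    """Log of (1) for one sensor: sum over n of log sum over h of r(h) l(n|h)."""
    return float(np.sum(np.log(l @ r)))

def log_similarity_fused(r, l_by_sensor):
    """Log of (11): the product over sensors s becomes a sum of per-sensor logs."""
    return sum(log_similarity(r, l_s) for l_s in l_by_sensor)

# Placeholder partial similarities: sensor 1 has 5 samples, sensor 2 has 8, H = 3 models
rng = np.random.default_rng(0)
r = np.array([0.5, 0.3, 0.2])
l_by_sensor = [rng.uniform(0.1, 1.0, size=(n_s, 3)) for n_s in (5, 8)]

fused = log_similarity_fused(r, l_by_sensor)
stacked = log_similarity(r, np.vstack(l_by_sensor))
print(fused, stacked)
```

The two values coincide, which is the sense in which equations (2) through (4) carry over with the index n replaced by the pair s,n.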


4.4  HIGH  LEVEL  FUSION 

High  level  fusion  requires  the  development  of  multilevel  architecture  and  high-level 
models. High-level models for situational awareness have been discussed in the literature, for
example [46]. Future research will have to combine these types of models with the lower-level
models described above. Here we define a general architecture for multi-level MFT [13]. The previous
sub-sections  described  a  single  processing  level  in  a  multi-level  MFT  system.  Input  signals  to 
each  level  are  activations  of  the  lower  level  models.  At  each  level  output  signals  are  activations 
of  the  models  recognized  at  this  level: 

$$a(h) = \sum_{n} f(h|n). \tag{12}$$


Output  signals  from  the  previous  level  become  input  signals  for  the  next  level.  In  general, 
a  higher  level  in  a  multilevel  system  provides  a  feedback  input  into  a  lower  level.  For  example, 
sensitivities  of  retinal  ganglion  cells  depend  on  the  objects  and  situations  recognized  higher  up  in 
the  visual  cortex;  or,  a  gaze  is  directed  based  on  which  objects  are  recognized  in  the  field  of 
view. Similar mechanisms can be implemented in multi-level MFT. Fig. 3 illustrates such a
multi-level MFT system.

Each  loop  of  operations  shown  at  each  level  in  the  above  figure  involves  multiple 
concept-models,  h  =  1,...  H.  To  some  extent  these  multiple  model-loops  are  independent,  yet 
some  models  interact  when  they  are  associated  with  the  same  input  signals.  Each  concept-model 
is an intelligent agent; at its own level it competes with other agent-models for evidence in data.
It sends its activation signal to the higher level. It may activate behavioral models (allocate
processing and sensor resources, etc., and generate behavior directed into the outside world -
processes not contained within the above equations). Each model-agent can activate an adaptation
mechanism; each model-agent possesses a degree of autonomy and interacts with other
agents. Thus MFT is an intelligent system composed of multiple adaptive intelligent agents.
Each agent interacts with the similarity measure and evokes a behavioral response; it is a
continuous loop of operations, interacting with other agents from time to time; an agent is
"dormant" until activated by a high similarity value. When activated, it is adapted to the signals
and other agents, so that the similarity increases. A subset of data in input signals may activate
several concept-agents; in this way data provide evidence for the presence of various objects (or
situations). Agents compete with each other for evidence (matching to signals), while adapting to
the new signals.


[Figure 3: block diagram showing, at each level, Similarity measures, Models, and Action/Adaptation.]

Fig. 3. Multilevel organization of the MFT system. High levels of similarity measures
correspond  to  concept-models  recognized  at  a  given  level  in  the  hierarchy;  these  are  the  input 
signals  to  the  next,  higher  level.  Also  concept-models  affect  behavior  (actions).  Models  at  a 
higher  level  are  more  general  than  models  at  a  lower  level,  like  situational  awareness  models  vs. 
track  models. 


5. INTEGRATING LANGUAGE AND SENSOR SIGNALS


High  level  sensor  fusion  requires  integration  of  communications  and  sensors.  A  number 
of  MFT  models  have  been  developed  for  various  sensors,  and  used  for  sensor  fusion  and  for 
recognition of simple situations [n]. By using concept-models with multiple sensor modalities, an
MFT system can integrate signals from multiple sensors, while adapting and improving internal
concept-models.  Similarly,  MFT  can  be  used  to  integrate  language  and  sensing.  This  requires  the 
development  of  language  MFT  models. 


5.1  LANGUAGE  MODELS 

Mathematical techniques previously considered for describing language ability suffer
from combinatorial complexity for the same reason as the cognitive models considered previously.
For example, Solomonoff's methodology [47] is combinatorial in computational complexity
because of its reliance on formal logic. Chomsky’s original ideas of Universal Grammar [20] also
relied on logical rules; his second proposal [21] relied on parametric models. Similarly
combinatorially complex are logical tree structures considered by Pinker [10]. Here, we consider
a non-combinatorial mathematical description of language learning and usage based on an
extension of MFT. We argue that symbolic abilities require joint working of language and
cognition, in other words, understanding of sensory data, and communication about this
understanding. Language, like MFT, is a hierarchical system; it involves sounds, phonemes,
words, phrases, sentences, grammar... and each level operates with its own models.
Development of models at each level is a research project, which is aided by a number of
already described linguistic models [9,10,12,48]. Here I discuss an approach to the development of
models of phrases from words. This can be used for text understanding; for example, it could be
used for an understanding-based search engine. The input data, X(n), in this “phrase-level” MF
language system, are word strings, for simplicity, of a fixed length, P, X(n) = { w_{n+1}, w_{n+2}...
w_{n+P} }. Here w_n are words from a given dictionary of size K, W = {w_1, w_2... w_K}, and n is the
word position in a body of texts. A simple phrase model is a “bag of words” [49,50,51], that is, a
model is a subset of words from a dictionary, without any order or rules of grammar,

$$M_h(P_h, n) = \{w_{h,1}, w_{h,2}, \ldots, w_{h,P}\}; \qquad (13)$$

the parameters of this model are its words, $M_h(P_h, n) = P_h = \{w_{h,1}, w_{h,2}, \ldots, w_{h,P}\}$.
The language acquisition project in this simplified context consists in defining
models-concepts-phrases best characterizing the given body of texts in terms of a similarity measure.

Conditional partial similarities between a string of text, X(n), and a model $M_h$ could be
defined by the proportion of matches between the two sets, X(n) and $M_h$:
$l(n|h) = |X(n) \cap M_h| / P$. Thus similarity (1) is defined, and it could be maximized over the
unknown parameters of the system, $\{P_h\}$, that is, over the word contents of phrases. This would
result in learning models-concepts-phrases, accomplishing the goal of the language acquisition
project. The difficulty of the above approach is that the dynamics of MFT cannot be used for the
similarity maximization; in particular, (3) requires evaluating derivatives, which requires a
smooth dependence of models on their parameters. Without the dynamic logic of MFT, the
computational complexity of this language acquisition project becomes combinatorial,
~ $K^{H \cdot N \cdot P}$, a prohibitively large number. In the following section we extend
dynamic logic to this type of model, specified as sets of qualitative variables.
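The crisp set-overlap similarity described above can be sketched in a few lines. This is a minimal illustration, not the report's implementation: the toy corpus, the phrase-model, and the function names `text_windows` and `crisp_similarity` are hypothetical. It shows why derivative-based maximization does not apply here: swapping one word in the model changes the score only in jumps of 1/P.

```python
def text_windows(words, P):
    """Return every length-P word string X(n) found in a list of words."""
    return [words[n:n + P] for n in range(len(words) - P + 1)]


def crisp_similarity(window, model):
    """l(n|h) = |X(n) ∩ M_h| / P: fraction of the window matched by the bag model."""
    return len(set(window) & set(model)) / len(window)


# Hypothetical toy corpus and phrase-model, chosen only for illustration.
text = "the radar tracks the target while the sonar tracks the submarine".split()
model = {"radar", "tracks", "target"}

scores = [crisp_similarity(w, model) for w in text_windows(text, P=3)]
# Index of the first window with the highest overlap.
best_n = max(range(len(scores)), key=scores.__getitem__)
```

Because the model is a discrete set, finding the best word contents with this measure means enumerating candidate word combinations, which is the combinatorial search the text refers to.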

5.2  DYNAMIC  LOGIC  OF  QUALITATIVE  SETS 

Developing  dynamic  logic  procedures  for  qualitative  sets,  like  phrase  bag  models, 
represents  a  principled  step  beyond  procedures  considered  above.  It  is  generally  important  for 
developing  multilevel  models,  because  input  signals  at  higher  levels  are  qualitative  variables, 
concepts,  recognized  at  lower  levels;  and  higher-level  models  are  sets  of  such  variables.  ‘Fitting’ 
uncertain  situational-awareness  models  to  uncertain  data  often  leads  to  combinatorial  complexity 
of  high-level  joint  learning  and  fusion.  Within  the  MFT  formulation  this  combinatorial 
complexity  is  related  to  a  need  to  maximize  a  similarity  measure  of  type  (1)  over  the  content  of 
qualitative set models. If this maximization were attempted by ‘brute force’ combinatorics, it
would lead to combinatorial complexity. This is a consequence of a “logic-type” similarity
measure, which treats every potential phrase-model (every combination of words) as a separate
logical statement. The problem can be solved by using dynamic fuzzy phrase-contents, as
follows. First, define fuzzy conditional partial similarity measures,

$$l(n|h) = (2\pi\sigma_h^2)^{-P/2} \exp\Big\{ -0.5 \sum_{p} e(n,h,p)^2 / \sigma_h^2 \Big\}, \qquad (14)$$

where e(n,h,p) is a distance (measured in the number of words) between the middle of the word
sequence X(n), that is n+P/2, and the closest occurrence of the word $w_{h,p}$; the sum here is over
words belonging to the phrase-model h. In practical implementations, the search for the nearest
word can be limited to $\pm 3\sigma_h$ words, and e(n,h,p) falling outside this range can be
replaced by $(3\sigma_h + 1)$. The dynamics of fuzziness of this similarity measure is given by a
modification of (6),

$$\sigma_h^2 = \sum_{n} f(h|n) \sum_{p} e(n,h,p)^2 / N_h. \qquad (15)$$
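The fuzzy similarity (14) and the fuzziness update (15) can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy corpus are hypothetical, windows are 0-indexed as `words[n:n+P]` with the midpoint taken as n + (P-1)/2 in place of n+P/2, the association weights f(h|n) are passed in rather than computed, and N_h is assumed to be the sum of f(h|n) over n.

```python
import math


def word_distance(words, center, target, sigma):
    """e(n,h,p): distance in words from `center` to the closest `target`.

    Only occurrences within +/- 3*sigma words are considered; if none is
    found, the distance is replaced by 3*sigma + 1, as suggested in the text.
    """
    limit = 3 * sigma
    hits = [abs(i - center) for i, w in enumerate(words)
            if w == target and abs(i - center) <= limit]
    return min(hits) if hits else limit + 1


def fuzzy_similarity(words, n, P, model, sigma):
    """Eq. (14) for window n and a phrase-model given as a list of words."""
    center = n + (P - 1) / 2  # assumed 0-indexed midpoint of the window
    sq = sum(word_distance(words, center, w, sigma) ** 2 for w in model)
    norm = (2 * math.pi * sigma ** 2) ** (-len(model) / 2)
    return norm * math.exp(-0.5 * sq / sigma ** 2)


def update_sigma2(words, P, model, sigma, f_h):
    """Eq. (15): new sigma_h^2 from association weights f_h[n] = f(h|n)."""
    N_h = sum(f_h)  # assumption: N_h is the total association weight of model h
    total = 0.0
    for n, f in enumerate(f_h):
        center = n + (P - 1) / 2
        total += f * sum(word_distance(words, center, w, sigma) ** 2 for w in model)
    return total / N_h


# Hypothetical toy corpus and phrase-model.
words = "the radar tracks the target while the sonar tracks the submarine".split()
model = ["radar", "tracks", "target"]

near = fuzzy_similarity(words, 1, 3, model, sigma=2.0)  # window around "radar tracks the"
far = fuzzy_similarity(words, 7, 3, model, sigma=2.0)   # window around "sonar tracks the"
```

Unlike the set-overlap measure, this similarity varies gradually with word distances and with the fuzziness sigma, which is what lets the iteration start vague and sharpen over time.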

Second, define fuzzy phrase-contents, that is, a degree of the word $w_{h,p}$ “belonging” to a
model-phrase h, $\varphi(p|h)$; this is a function of the average distance of the word $w_{h,p}$
from the phrase-model, s(h,p):

$$s(h,p) = \sum_{n} f(h|n)\, e(n,h,p)^2 / N_h; \qquad (16)$$

$$\varphi(p|h) = p(h|p) \Big/ \sum_{p' \in h} p(h|p'); \qquad p(h|p) = (2\pi\sigma_h^2)^{-1/2} \exp\{ -0.5\, s(h,p) / \sigma_h^2 \}. \qquad (17)$$

The dynamics of the word contents of the phrase-models is given by modifying P (the number of
words in phrases) in the iteration process, say, by defining $P_h \sim P\sigma_h$, or by requiring
$\varphi(p|h)$ to be above a threshold value and keeping in each phrase-model the words that
satisfy this criterion. The dynamics defined in this way results in learning phrase-models
(concepts) and accomplishes the goal of the language acquisition project without combinatorial
complexity; the computational complexity is moderate, ~ H*K*P.
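The fuzzy phrase-contents of (16) and (17), together with the threshold rule for keeping words, can be sketched as below. The function names, the input layout (`e2[n][p]` for squared distances, `f_h[n]` for association weights), the numbers, and the threshold value are all hypothetical; N_h is again assumed to be the sum of f(h|n) over n.

```python
import math


def word_membership(e2, f_h, sigma2):
    """Return phi(p|h) for every word p of one phrase-model h.

    e2[n][p] : squared distance e(n,h,p)^2 for window n and model word p
    f_h[n]   : association weight f(h|n) of window n with model h
    sigma2   : current fuzziness sigma_h^2
    """
    N_h = sum(f_h)  # assumption: N_h is the total association weight of model h
    n_words = len(e2[0])
    # Eq. (16): weighted average squared distance of each word from the model.
    s = [sum(f * row[p] for f, row in zip(f_h, e2)) / N_h for p in range(n_words)]
    # Eq. (17): Gaussian weight per word, then normalization over the model's words.
    g = [((2 * math.pi * sigma2) ** -0.5) * math.exp(-0.5 * s_p / sigma2) for s_p in s]
    total = sum(g)
    return [g_p / total for g_p in g]


def prune(model, phi, threshold):
    """Keep only the words whose membership phi(p|h) reaches the threshold."""
    return [w for w, m in zip(model, phi) if m >= threshold]


# Hypothetical numbers: two windows, three candidate words; the third word
# is always far from the phrase and should lose its membership.
model = ["radar", "tracks", "banana"]
e2 = [[1.0, 0.0, 49.0],
      [0.0, 1.0, 49.0]]
f_h = [0.5, 0.5]

phi = word_membership(e2, f_h, sigma2=4.0)
kept = prune(model, phi, threshold=0.1)
```

Each pass touches every word of every model once per window, so the work grows polynomially with the number of models and words rather than with the number of word combinations.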


The “bag-of-words” phrase models considered above are simpler than tree-like
dependencies or known structures of natural languages [9,10,12,13,48,52]. These more complicated
“real” linguistic models can be used in place of a simple distance measure e(n,h,p) in (15). In this
way the models of noun and verb phrases and tree structures can be incorporated into the above
formalism of MFT.

5.3  JOINT  LANGUAGE  AND  SENSOR  MODELS 

Integration of language and sensor cognition in MFT is attained by joint language and
cognitive models, so that a complete concept-model $M_h$ is given by [53,33,51]

$$M_h = \{M^C_h, M^L_h\}; \qquad (18)$$

Here $M^C_h$ denotes the cognitive (sensory) part of models of objects and situations in the world,
like those considered in section 4, and $M^L_h$ is the language part of the model. Consider now this
integrated  model  as  the  mind’s  mechanism  of  integrating  language  and  cognition.  A  data  stream 
constantly  comes  into  the  mind  from  all  sensory  perceptions;  every  part  of  this  data  stream  is 
constantly evaluated and associated with cognitive models using the fuzzy dynamic logic
mechanism described in previous sections. In this fuzzy dynamic association, at the beginning
the models are fuzzy, the difference between language models and other models for sound
signals is uncertain, and every piece of data is associated with many models, linguistic and
cognitive. Gradually, models are adapted, their correspondence to specific data improves, and
selectivity to language signals and non-language sounds is enhanced. Language models are
associated with some degree of specificity with words (sentences, etc.), and cognitive models are
associated with objects and situations of perception and cognition. Some degree of association
between language and cognitive models occurs before any of the models attains a high degree of
specificity characteristic of our conscious concepts. Certain language models evolve faster than
their corresponding cognitive models and vice versa. Correspondingly, the uncertainty and
fuzziness of the two aspects of integrated models may differ significantly. Still, the existence of a
low-fuzziness linguistic model speeds up learning and adaptation of the corresponding cognitive
model and vice versa. I suggest that this is a mechanism of interaction between language and cognition.

The described mechanism of interaction between language and cognition may apply to
ontological development and learning, biological species evolution, and evolution of cultures. The
difference between these learning and evolution processes is in the degree of specificity of a
priori models (inborn, or accumulated in culture) and in the type of data available for learning
and  evolution.  For  example,  child  learning  occurs  in  parallel  in  three  realms:  (1)  language 
models  are  learned  to  some  extent  independently  from  cognition,  when  language  data  are 
encountered  for  the  first  time  with  limited  or  no  association  with  perception  and  cognition  (like 
in  a  newborn  baby);  (2)  similarly,  cognitive  models  can  be  learned  to  some  extent  independently 
from  language,  when  perception  signal  data  are  encountered  for  the  first  time  in  limited  or  no 
association  with  language  data;  and  (3)  language  and  cognitive  models  are  learned  jointly,  when 
language data are present in some association with perception signals, as when a mother talks
to a baby: “this is a car” (visual-perception-models and the corresponding language-word-models
are engaged together); another example is a more complicated conversation: “Look at
Peter and Ann, they are in love” (leads to learning related cognitive-models and phrase-models).
A significant part of child learning (between the ages of 2 and 5) consists in learning language
models first; it would take a significant part of life to learn cognitive models corresponding to
these language models. Language models learned first stimulate learning of cognitive models.

Development of high-level fusion systems should follow a similar path. Integration of
language and cognition in MFT is attained by characterizing objects and situations in the world
with two types of models, language models considered above and cognitive models considered
in section 4 and in [7,11,46]. A relatively simple system can use bag-models for each layer, like a
‘bag of phrases’ model for the next level of concepts (say, sentence, paragraph), and so on.
Alternatively, more realistic language models of sentences, paragraphs and large bodies of texts
can be used [54,55]. Such an integrated MFT system learns similarly to a human, in parallel in three
realms as described above. If this high-level fusion system interacts with human and
computer agents, it will learn language and cognition, similarly to human babies. We can hope
that such systems will improve with experience and not become obsolete.

In the 1960s and 70s artificial intelligence relied on “symbolic” methods. In the 1980s
intelligence research switched to exploring a variety of neural paradigms. Since the mid-1990s
mixed techniques have been explored, and the proper tradeoffs between neural and symbolic
techniques have received a great deal of attention. In this new research, however, the fundamental
limitation of the old “symbolic” methods was not identified. Namely, it was not analyzed why the
same word “symbol” is used for trivial objects, like traffic signs, and for culturally significant
artifacts provoking wars and peace, like the Magen David, Cross, or Crescent. We suggest the reason
is that the word “symbol” is used with two different meanings. One is logically defined definite
objects; another is dynamic processes unifying language and cognition. A mathematical
description  of  these  dynamic  processes  is  proposed  in  this  section.  The  conclusion  is  that 
combining  axiomatic-logical  notation-signs  with  standard  neural  architectures  will  not  lead  to 
symbolic  ability.  The  promising  approach  to  describe  symbols  mathematically  is  outlined  above: 
symbols  are  adaptive  processes  combining  language  and  cognition.  These  symbols  are  initially 
domain-independent; they learn specific domain-dependent information “on their own.” In the
following  section  we  consider  psychological  interpretation  of  these  symbol-processes  and  relate 
the  above  mathematics  to  the  working  of  mind. 

6.  DISCUSSION


6.1  WHY  MIND,  INSTINCTS  AND  EMOTIONS? 

At  the  beginning  of  this  paper  I  summarized  some  justifications  for  following  biological 
examples  in  engineering  system  design.  Still,  often  one  can  hear  a  question:  Why  does  an 
engineer  need  to  know  about  concepts  and  emotions?  After  mathematical  equations  are  derived, 
why  not  just  use  them  for  developing  computer  code?  Why  should  an  engineer  be  concerned 
with  interpretations  of  these  equations  in  terms  of  instincts  and  emotions?  This  question  is 
profound  and  an  answer  can  be  found  in  the  history  of  science  and  engineering.  Newtonian  laws 
can  be  written  in  a  few  lines,  but  an  engineering  manager  cannot  hand  these  few  lines  to  a  young 
engineer and ask her to design an airplane or rocket. Similarly, Maxwell's equations contain the
main principles of radar and communication, but radars and communication systems cannot be
built without knowledge of electromagnetic phenomenology. For the same reason, MFT and
dynamic logic equations need to be supplemented by an understanding of the phenomenology of the
mind's signal processing to be applied efficiently to the design of high-level fusion systems. For
this reason, in the conclusion of this paper we summarize the main aspects of the working of the
mind as described by the equations given in this paper.

6.2  MFT  DYNAMICS 

Equations  in  section  4  describe  elementary  processes  of  perception  or  cognition,  in  which 
a  number  of  model-concepts  compete  for  incoming  signals,  model-concepts  are  modified  and 
new  ones  are  formed,  and  eventually,  more  or  less  definite  connections  [high  values  of  f(h|n), 
close  to  1]  are  established  among  signal  subsets  on  the  one  hand  and  some  model-concepts  on 
the  other,  accomplishing  perception  and  cognition. 

A salient mathematical property of these processes, ensuring smooth convergence, is a
correspondence between uncertainty in models (that is, in the knowledge of model parameters)
and uncertainty in associations f(h|n). In perception, as long as model parameters do not
correspond to actual objects, there is no match between models and signals; many models poorly
match many objects, and associations remain fuzzy (between 0 and 1). Eventually, one model
(h') wins a competition for a subset {n'} of input signals X(n), when parameter values match
object properties, and f(h'|n) values become close to 1 for $n \in \{n'\}$ and 0 for
$n \notin \{n'\}$. In other words, a subset of data is recognized as a specific object (concept).
Upon convergence, the entire set of input signals {n} is divided into subsets, each associated with
one model-object, uncertainties become small, and fuzzy concept-models become crisp concepts.
Cognition is different from perception in that models are more general and more abstract, and
input signals are
the  activation  signals  from  concepts  identified  (perceived,  cognized)  at  a  lower  hierarchical 
level;  the  general  mathematical  laws  of  cognition  and  perception  are  similar  and  constitute  a 
basic  principle  of  the  mind  organization.  Kant  was  the  first  one  to  propose  that  the  mind 
functioning  involves  three  basic  abilities:  Pure  Reason  (concept-models),  Judgment  (emotional 
measure  of  correspondence  between  models  and  input  signals),  and  Practical  Reason  (behavior; 
we  only  considered  here  the  behavior  of  adaptation  and  learning)  [  ’  ’  ].  An  initial  “mapping” 
between  Kantian  theory  of  the  mind  and  MFT  was  outlined  in  ["].  We  now  briefly  discuss 
relationships between the MFT and concepts of mind originating in psychology, philosophy,
linguistics,  aesthetics,  neuro-physiology,  neural  networks,  artificial  intelligence,  pattern 
recognition,  and  intelligent  systems. 
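The correspondence between model uncertainty and association uncertainty described in this subsection can be illustrated numerically. The sketch below assumes the association weights take the normalized form f(h|n) = r(h) l(n|h) / Σ_h' r(h') l(n|h'), with one-dimensional Gaussian similarities; this form, the function name `associations`, and the numbers are assumptions made for illustration and are not taken from the equations of section 4.

```python
import math


def associations(x, means, sigma, priors):
    """Association weights f(h|n) of one signal x with several Gaussian models.

    Assumed form: f(h|n) = r(h) l(n|h) / sum over h' of r(h') l(n|h').
    """
    sims = [math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
            for m in means]
    weighted = [r * l for r, l in zip(priors, sims)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical signal and two competing models with equal priors r(h).
x = 1.0
means = [0.0, 3.0]
priors = [0.5, 0.5]

fuzzy = associations(x, means, 10.0, priors)  # large uncertainty: weights near 0.5
crisp = associations(x, means, 0.3, priors)   # small uncertainty: first model wins
```

With a large sigma the signal is shared almost equally between the two models; as sigma shrinks, the weight of the closer model approaches 1, which is the fuzzy-to-crisp transition the text describes.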

6.3  ELEMENTARY  THOUGHT-PROCESS,  CONSCIOUS,  AND  UNCONSCIOUS 

A  thought-process  or  cognition  involves  a  number  of  sub-processes  and  attributes, 
including internal representations and their manipulation, attention, memory, concept formation,
knowledge,  generalization,  recognition,  understanding,  meaning,  prediction,  imagination, 
intuition, emotion, decisions, reasoning, goals, behavior, conscious and unconscious [11,14,62].
Here  and  in  the  following  subsections  we  discuss  how  these  processes  are  described  by  MFT. 

A  “minimal”  subset  of  these  processes,  an  elementary  thought-process,  has  to  involve 
mechanisms  for  afferent  and  efferent  signals  [14],  in  other  words,  bottom-up  and  top-down 
signals  coming  from  outside  (external  sensor  signals)  and  from  inside  (internal  representation 
signals).  According  to  Carpenter  and  Grossberg  [59]  every  recognition  and  concept  formation 
process  involves  a  “resonance”  between  these  two  types  of  signals.  In  MFT,  at  every  level  in  a 
hierarchy  the  afferent  signals  are  represented  by  the  input  signal  field  X,  and  the  efferent  signals 
are  represented  by  the  modeling  fields  Mh;  resonances  correspond  to  high  similarity  values 
l(n|h)  for  some  subsets  of  {n}  that  are  “recognized”  as  concepts  (or  objects).  The  mechanism 
leading  to  the  resonances  between  incoming  signals  and  internal  representations  is  given  by 
equations  in  section  4.  The  elementary  thought-process  also  involves  elements  of  conscious  and 
unconscious  processes,  imagination,  memory,  concepts,  instincts,  emotions,  understanding  and 
behavior  as  described  later. 

A description of the working of the mind as given by the MFT dynamics was first provided
by Aristotle [6  ], describing cognition as a learning process in which an a priori
form-as-potentiality (fuzzy model) meets matter (sensor signals) and becomes a form-as-actuality (a
logical concept). Jung suggested that conscious concepts are developed based on genetically
inherited structures of the mind, archetypes, which are inaccessible to consciousness [61], and
Grossberg [14] suggested that only signals and models attaining a resonant state (that is, signals
matching models) reach consciousness. Fuzzy uncertain models are less accessible to
consciousness, whereas more crisp and certain models are more accessible to consciousness.

6.4  IMAGINATION  AND  COGNITION 

Visual imagination involves excitation of a neural pattern in the visual cortex in the absence of
actual sensory stimulation (say, with closed eyes). The same visual cortex neurons that serve
for perception are also used by the brain for imagination. The Carpenter and Grossberg resonance
model [59] and the MFT dynamics both describe imagination as an inseparable part of perception
and cognition: imagined patterns are top-down signals that prime the perception cortex areas
(priming is a neural term for making neural cells more readily excited). In MFT,
models  Mh  give  the  imagined  neural  patterns.  Perception  and  cognition  occur  as  a  match 
between  top-down  imaginations  and  bottom-up  signals  from  sensory  organs  or  from  model- 
concepts  recognized  at  lower  levels.  Kant  [57]  came  amazingly  close  to  explaining  the 
mechanisms  of  perception  and  cognition,  when  he  called  them  “a  play  of  cognitive  functions  of 
imagination  and  sensing”  [25]. 

6.5  MIND  VS.  BRAIN


Historically,  the  mind  is  described  in  psychological  and  philosophical  terms,  whereas  the 
brain  is  described  in  terms  of  neurobiology  and  medicine.  Within  scientific  exploration  the  mind 
and brain are different description levels of the same system. Establishing relationships between
these descriptions is of great scientific interest. Today we approach solutions to this challenge
[62], which eluded Newton in his attempt to establish a physics of “spiritual substance” [6  ].
General neural mechanisms of elementary perception and cognition (which are similar in
MFT and ART [59]) have been confirmed by neural and psychological experiments; this includes
neural mechanisms for bottom-up (sensor) signals, top-down “imagination” model-signals, and
the resonant matching between the two [62,64]. Adaptive modeling abilities are well studied and
adaptive  parameters  identified  with  synaptic  connections  [65];  instinctual  learning  mechanisms 
have  been  studied  in  psychology  and  linguistics  [  ’  ’  ];  identifying  neural  structures  responsible 
for  knowledge  and  language  instincts  is  a  next  challenge  for  the  neural  sciences. 

6.6  INSTINCTS  AND  EMOTIONS 

Functioning  of  the  mind  and  brain  cannot  be  understood  in  isolation  from  the  system’s 
“bodily  needs”.  For  example,  a  biological  system  (and  any  autonomous  system)  needs  to 
replenish  its  energy  resources  (eat);  this  and  other  fundamental  unconditional  needs  are  indicated 
to the system by instincts, which could be described as internal sensors. Emotional signals
generated by this instinct are perceived by consciousness as “hunger”, and they activate
behavioral  models  related  to  food  searching  and  eating.  In  this  paper  we  were  concerned 
primarily  with  the  behavior  of  perception,  cognition,  and  language  learning,  which  are  governed 
by  the  instincts  for  knowledge  and  language.  Other  instinctual  influences  modify  perception  and 
cognition  processes  in  such  a  way  that  desired  objects  “get”  enhanced  recognition.  It  can  be 
accomplished  by  modifying  priors,  r(h),  according  to  the  degree  to  which  an  object  of  type  h  can 
satisfy  a  particular  instinct.  Details  of  these  mechanisms  are  not  considered  in  this  paper. 

6.7  AESTHETIC  EMOTIONS  AND  INSTINCT  FOR  KNOWLEDGE 

Recognizing  objects  in  the  environment  and  understanding  their  meaning  is  so  important 
for  human  evolutionary  success  that  there  has  evolved  an  instinct  for  learning  and  improving 
concept-models  [<l7  25].  This  instinct  (for  knowledge  and  learning)  is  described  in  MFT  by 
maximization  of  similarity  between  the  models  and  the  world,  eq.  (1).  Emotions  related  to 
satisfaction-dissatisfaction  of  this  instinct  we  perceive  as  harmony-disharmony  (between  our 
understanding  of  how  things  ought  to  be  and  how  they  actually  are  in  the  surrounding  world). 
According  to  Kant  [57]  these  are  aesthetic  emotions  (emotions  that  are  not  related  directly  to 
satisfaction  or  dissatisfaction  of  bodily  needs).  Aesthetic  emotions  in  MFT  correspond  to 
changes  in  the  knowledge  instinct  (1).  The  aesthetic  emotion  is  negative,  when  new  input  signals 
do  not  correspond  to  existing  models.  The  mathematical  basis  for  the  theorem  in  section  4.2  can 
be  interpreted  psychologically:  during  dynamic  logic  iterations  the  aesthetic  emotion  is  always 
positive. The MFT system ‘enjoys’ learning.

In section 4 we considered perception and cognition concept-models and similarity
measures; using them in (1) yields an instinct driving the MFT system to improve its knowledge
about the world. Similarly, using in (1) the language models and similarity measures considered in
section 5 yields an MFT system improving its knowledge of language, that is, the language instinct.
Combining cognitive and linguistic models results in a system with combined linguistic and
thinking abilities: language and sensor information together help adapt both language and
cognitive models. This is the main mechanism of cultural accumulation and transmission of
knowledge;  and  it  can  serve  as  a  foundation  for  knowledge  accumulation  and  transmission  in 
collaborative  multi-level  fusion  systems. 

6.8  COGNITION,  SIGNS  AND  SYMBOLS 

Signs and symbols are essential for the working of the human mind, for accumulation and
transmission of knowledge in human culture, and are extensively used in intelligent and
multi-level fusion systems. Scientific theories of signs and symbols, however, are not well developed
and  even  the  exact  meaning  of  these  words  is  often  confused.  According  to  [<lS]  symbol  is  the 
most  misused  word.  We  use  this  word  in  trivial  cases  referring,  say,  to  traffic  signs  and  in  the 
most profound cases of cultural and religious symbols. In mathematics and in “Symbolic AI”
there is no difference between signs and symbols. Both are considered as notations, arbitrary
non-adaptive entities with axiomatically fixed meaning. This non-differentiation is a “hangover”
from an old superstition that logic describes the mind, a direction in mathematics and logical
philosophy that can be traced through the works of Frege, Hilbert, and Russell, to its bitter end in
Gödel's theory, and its revival during the 1960s and 1970s in artificial intelligence. The profound
use of the word symbol in general culture, according to Jung, is related to symbols being
psychological processes of sign interpretation. Jung emphasized that symbol-processes connect
conscious and unconscious [6  ]. Pribram wrote of symbols as adaptive, context-sensitive signals
in the brain, whereas signs he identified with less adaptive and relatively context-insensitive
neural signals [69]. Deacon [68] thought that the essence of the human symbolic ability is two
interacting parallel hierarchies, like those described in sections 4 and 5: a hierarchy of cognitive
models and a hierarchy of sign (language) models; he called it symbolic reference.

Combining  mathematical  developments  in  sections  4  and  5  with  the  above  discussion,  we 
reach  the  following  conclusion  for  consistent  meanings  of  signs  and  symbols  [25,51].  The  essence 
of  a  sign  is  that  it  is  an  arbitrary  notation,  which  can  be  interpreted  by  our  mind  or  by  an 
intelligent system to refer to something else, to an object or situation. Symbols are psychological
processes of sign interpretation; they are equivalent to elementary thought processes (section
6.3), and they integrate unconscious (fuzzy models) with conscious (crisp models). A simple
symbol  process  is  mathematically  described  by  a  single  MFT  level,  like  in  section  4.  A  complex 
symbol-process  of  cognition  of  culturally  important  concepts  may  take  hundreds  of  years  and 
involve  multiple  levels  of  MFT  or  the  mind  hierarchy.  Future  intelligent  systems  with  multiple 
levels  of  fusion,  future  sensor-webs,  will  be  designed  using  this  biological  knowledge.  They  will 
participate  in  human-computer  collaborative  networks.  They  will  integrate  learning  of  language 
with  learning  of  complex  cognitive  concepts.  They  will  integrate  communication  with 
information fusion, and instead of quick obsolescence, their performance will improve with time
and  experience  by  accumulating  knowledge  similar  to  human  cultures. 

7.  SUMMARY


This paper proposes that flexible and adaptive high-level fusion, as well as high-level
cognition, is not possible without fusion of cognition and language. High-level cognition as
understanding of the contents of sensor signals cannot be developed separately from language-type
communication. One reason for this statement is biological analogy: only the human mind is
capable of high-level fusion, cognition, and language, and the human mind develops these
capabilities jointly. Another reason for this statement is mathematical: high-level models or
understandings cannot be learned (developed, adapted) directly from sensor signals, because
they are not grounded directly in sensor signals; they are grounded in high-level
language-communication messages. Language, in turn, is grounded in mutual understanding among
multiple  communication  agents.  Future  adaptive  high-level  fusion  systems,  therefore,  will  be 
collaborative  systems  involving  man  and  machine. 

The paper proposes a mathematical approach to developing such integrated
sensor-communication or cognition-language systems. It combines PBIS with MFT. The main elements
include the phenomenology of the human mind and sensing processes and a dual hierarchy of MFT:
a cognitive hierarchy and a language hierarchy. At each hierarchical level there are models,
similarity  measures,  and  the  dynamic  logic  mechanism  maximizing  similarity  between  the 
models  and  input  signals.  Input  signals  to  each  level  are  concept-models  recognized  at  a  lower 
level,  output  signals  are  new  concept-models.  In  addition  to  this  upstream  of  more  and  more 
complicated  concept-models,  there  is  a  downstream  of  adaptational,  attentional,  and  behavioral 
signals.  Cognitive  interpretation  of  this  structure  includes  models  as  mechanisms  of  conceptual 
understanding,  similarity  maximization  as  the  instinct  for  knowledge,  changes  in  similarity  as 
aesthetic emotions. An important aspect of dynamic logic operation is that the initial states of
cognitive and language models are vague and fuzzy, corresponding to absent or
uncertain knowledge. As a result of learning and adaptation, uncertain models develop into
certain crisp or probabilistic knowledge. In this process all available information is extracted
from signals and fused jointly in sensory and communication domains.